Fitting Primary Growth Models

Statistical Examination of Challenge Test Data and Modelling

2024-04-30

Objectives

  • You will learn how to:
    • Import a .csv file to R software
    • Check a challenge test data set
    • Select an adequate primary growth model
    • Fit a primary growth model to challenge test data
    • Examine the fitting results
    • Obtain the confidence and prediction intervals
  • Materials needed:
    • This presentation
    • The R script: Unit1-ChallengeTestsDataAnalysisPGM.R
    • The accompanying video

Data

Challenge test data

  • Growth data example
    • E. coli challenge test
      • data set: ecoli.csv
      • Experiments were carried out at 30 and 35 \(^\circ\)C
      • Two batches/repetitions per condition
  • First, we load the data set using the read.csv() function
df <- read.csv("../data/ecoli.csv", sep=";", header=TRUE)
  • Let's check the first lines of the file using the head() function
head(df)
  Condition Repetition Time Temp   lnN
1         1          1    0   30  5.93
2         1          1    3   30  5.82
3         1          1    5   30  6.04
4         1          1    7   30  7.45
5         1          1    9   30 10.38
6         1          1   11   30 12.06
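
The import step can be tried without the course file. The sketch below is a stand-in, not the actual ecoli.csv: it writes only the six rows printed above to a temporary semicolon-separated file (the names `csv.lines`, `tmp.file` and `df.demo` are hypothetical) and reads it back with the same read.csv() arguments.

```r
# Stand-in for ecoli.csv: only the six rows printed by head(df) above
csv.lines <- c(
  "Condition;Repetition;Time;Temp;lnN",
  "1;1;0;30;5.93",
  "1;1;3;30;5.82",
  "1;1;5;30;6.04",
  "1;1;7;30;7.45",
  "1;1;9;30;10.38",
  "1;1;11;30;12.06"
)

# Write the lines to a temporary file (hypothetical path, for illustration only)
tmp.file <- tempfile(fileext = ".csv")
writeLines(csv.lines, tmp.file)

# Same call as in the slides: semicolon separator, first line holds column names
df.demo <- read.csv(tmp.file, sep = ";", header = TRUE)

head(df.demo)
str(df.demo)
```

If the separator were left at its default (a comma), each line would be read as a single column, so checking ncol() or str() right after the import is a cheap safeguard.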

Challenge test data

  • Check the structure of the data set using the str() function
  • This confirms that the right data set was loaded and that each column has the expected type
str(df)
  • The data set has 42 observations (rows) and 5 variables (columns)
'data.frame':   42 obs. of  5 variables:
 $ Condition : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Repetition: int  1 1 1 1 1 1 1 1 1 1 ...
 $ Time      : num  0 3 5 7 9 11 13 15 16 20 ...
 $ Temp      : int  30 30 30 30 30 30 30 30 30 30 ...
 $ lnN       : num  5.93 5.82 6.04 7.45 10.38 ...

Check the data structure

  • Check the levels of the variables: Condition and Temp
  • Use the table() function

Condition


 1  2 
22 20 

Temperature


30 35 
22 20 

Interaction Condition/Temp

   
    30 35
  1 22  0
  2  0 20
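
The table() calls behind tabulations like those above are not shown on the slide; a minimal sketch follows. The data frame `df.demo` is a hypothetical stand-in that reproduces only the group sizes reported above (22 rows for Condition 1 at 30, 20 rows for Condition 2 at 35), so the code runs without ecoli.csv.

```r
# Hypothetical stand-in reproducing only the group sizes shown above
df.demo <- data.frame(
  Condition = rep(c(1, 2), times = c(22, 20)),
  Temp      = rep(c(30, 35), times = c(22, 20))
)

# One-way tables: number of observations per level
table(df.demo$Condition)
table(df.demo$Temp)

# Two-way table: each Condition should map to exactly one Temp
tab <- table(df.demo$Condition, df.demo$Temp)
tab
```

With the real data, replace `df.demo` with `df`. Zeros off the diagonal of the two-way table indicate that Condition and Temp carry the same grouping information.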

Check the data shape

  • Plot lnN against Time
library(ggplot2)
ggplot(data = df, aes(x=Time, y=lnN, group=factor(Temp), colour=factor(Temp))) +
  geom_point()

Check the data shape

  • The E. coli population (lnN) increases with Time
  • lnN shows a clearly non-linear (sigmoidal) relationship with Time

Primary growth models

Primary growth models

  • Describe how a microbial population changes over time in a given food environment

  • Once fitted, they let us predict the growth behaviour of the microorganisms under specific environmental conditions, such as pH, salt content and temperature

  • Primary growth models are characterised by the kinetic growth parameters

    • \(\lambda\): lag phase duration
    • \(k\): maximum growth rate, or
    • \(\mu\): specific growth rate
    • \(M\): maximum population density

Model fitting tools

Software

  • We use the R software
    • It's free
    • Developed for statistical analysis and plotting
    • Provides functions to fit non-linear models to experimental data
      • nls() function from the stats package
      • gsl_nls() from the gslnls package (GNU Scientific Library)
      • Others
  • Download and install the R software: https://cran.r-project.org/
  • Install the predmicror package: https://fsqanalytics.github.io/predmicror/
devtools::install_github("fsqanalytics/predmicror")

Primary growth model

Selected model: Huang full model

\[Y = Y_0 + Y_{max} - \ln \left( e^{Y_0} + (e^{Y_{max}} - e^{Y_0}) \times e^{-\mu_{max} \times B} \right)\]
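
To make the formula concrete, here is a minimal R sketch of the Huang full model. It is not the predmicror source: the function name `huang.sketch` is hypothetical, and the transition function B(t) with the constant alpha = 4 is an assumption based on the commonly published form of the Huang model, since the slide does not define B.

```r
# Hand-written sketch of the Huang full model (not the predmicror implementation).
# Assumption: B(t) = t + (1/alpha) * log((1 + exp(-alpha * (t - lag))) /
#                                        (1 + exp(alpha * lag))), with alpha = 4.
huang.sketch <- function(t, Y0, Ymax, MUmax, lag, alpha = 4) {
  B <- t + (1 / alpha) * log((1 + exp(-alpha * (t - lag))) /
                             (1 + exp(alpha * lag)))
  Y0 + Ymax - log(exp(Y0) + (exp(Ymax) - exp(Y0)) * exp(-MUmax * B))
}

# Evaluate the curve with illustrative parameter values (natural-log scale)
times <- c(0, 5, 10, 15, 24)
y <- huang.sketch(times, Y0 = 6, Ymax = 19, MUmax = 1.2, lag = 5.5)
round(y, 2)
```

Under these assumptions the sketch returns Y0 at time zero and approaches Ymax at long times, which is the behaviour the kinetic parameters are meant to describe.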

Fitting procedure

Starting values

  • To fit non-linear models we need to supply starting values for the model parameters
  • So, let's start by defining the starting values
# Initial guesses for the four Huang model parameters
start.values <- list(Y0 = 0.0, Ymax = 22.0, MUmax = 1.7, lag = 5.0)
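
Starting values are often read off the data. The sketch below shows one common heuristic, offered as an assumption rather than as the course's method: take the smallest and largest lnN as guesses for Y0 and Ymax, the steepest slope between consecutive points as MUmax, and the time at which that steepest line drops back to Y0 as lag. It uses only the six rows printed by head(df) earlier, so the curve has not yet reached its plateau and the Ymax guess is correspondingly low. All object names are hypothetical.

```r
# Six observations printed by head(df) (Condition 1, Repetition 1)
time <- c(0, 3, 5, 7, 9, 11)
lnN  <- c(5.93, 5.82, 6.04, 7.45, 10.38, 12.06)

# Slope between each pair of consecutive observations
slopes <- diff(lnN) / diff(time)
i <- which.max(slopes)          # interval with the fastest growth

Y0.guess    <- min(lnN)         # lowest observed level
Ymax.guess  <- max(lnN)         # highest observed level (a plateau, if reached)
MUmax.guess <- slopes[i]        # steepest observed slope

# Time at which the steepest line, drawn from the start of interval i,
# goes back down to Y0.guess
lag.guess <- time[i] - (lnN[i] - Y0.guess) / MUmax.guess

start.guess <- list(Y0 = Y0.guess, Ymax = Ymax.guess,
                    MUmax = MUmax.guess, lag = lag.guess)
str(start.guess)
```

For these rows the heuristic gives a lag guess close to 6 and a MUmax guess of about 1.5, the same order of magnitude as the values supplied above.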

Fit using gsl_nls() function

  • Now we can fit the Huang model to the experimental data
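  • Let's start with the data from Condition 1, Repetition 1
library(predmicror)  # provides the HuangFM() model function
library(gslnls)      # provides gsl_nls()
# Fit the Huang full model to a single growth curve
fit <- gsl_nls(lnN ~ HuangFM(Time, Y0, Ymax, MUmax, lag),
               data = df[df$Condition == 1 & df$Repetition == 1, ],
               start = start.values)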
fit
Nonlinear regression model
  model: lnN ~ HuangFM(Time, Y0, Ymax, MUmax, lag)
   data: df[df$Condition == 1 & df$Repetition == 1, ]
    Y0   Ymax  MUmax    lag 
 5.918 18.840  1.182  5.584 
 residual sum-of-squares: 0.4724

Algorithm: multifit/levenberg-marquardt, (scaling: more, solver: qr)

Number of iterations to convergence: 5 
Achieved convergence tolerance: 8.214e-11
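
The Software slide also lists nls() from the stats package. As a rough sketch of that alternative, the code below fits a hand-written Huang function with base R only. Everything in it is an assumption made for illustration: `huang.sketch` is a hypothetical function using alpha = 4 in B(t), and the data are synthetic (a known curve plus small fixed perturbations), not the E. coli measurements.

```r
# Hand-written Huang full model; alpha = 4 is an assumption (see lead-in)
huang.sketch <- function(t, Y0, Ymax, MUmax, lag, alpha = 4) {
  B <- t + (1 / alpha) * log((1 + exp(-alpha * (t - lag))) /
                             (1 + exp(alpha * lag)))
  Y0 + Ymax - log(exp(Y0) + (exp(Ymax) - exp(Y0)) * exp(-MUmax * B))
}

# Synthetic data: a known curve plus small fixed perturbations (not real data)
sim <- data.frame(Time = c(0, 3, 5, 7, 9, 11, 13, 15, 16, 20, 24))
eps <- c(0.10, -0.15, 0.05, 0.12, -0.08, 0.15, -0.10, 0.06, -0.12, 0.09, -0.05)
sim$lnN <- huang.sketch(sim$Time, Y0 = 6, Ymax = 19, MUmax = 1.2, lag = 5.5) + eps

# Base R fit with stats::nls(); starting values chosen close to the data
fit.nls <- nls(lnN ~ huang.sketch(Time, Y0, Ymax, MUmax, lag),
               data  = sim,
               start = list(Y0 = 5.5, Ymax = 18, MUmax = 1, lag = 5))

coef(fit.nls)
```

By default nls() uses a Gauss-Newton algorithm, which tends to be more sensitive to poor starting values than the Levenberg-Marquardt algorithm reported in the gsl_nls() output above, so the starting values here are deliberately placed near the simulated curve.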

Check fitting results

  • The fitting was successful!
  • For a detailed summary of the model fit we can use the summary() function
summary(fit)

Formula: lnN ~ HuangFM(Time, Y0, Ymax, MUmax, lag)

Parameters:
      Estimate Std. Error t value Pr(>|t|)    
Y0     5.91795    0.15454   38.29 2.15e-09 ***
Ymax  18.83981    0.18286  103.03 2.14e-12 ***
MUmax  1.18173    0.03892   30.36 1.09e-08 ***
lag    5.58393    0.25000   22.34 9.12e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2598 on 7 degrees of freedom

Number of iterations to convergence: 5 
Achieved convergence tolerance: 8.214e-11

Parameter confidence intervals

  • Extract the confidence intervals of the parameters using the confint() function
confint(fit)
      2.5 %    97.5 %
1  5.552527  6.283381
2 18.407425 19.272192
3  1.089686  1.273772
4  4.992780  6.175070
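
The limits printed above agree, to the precision of the rounded inputs, with the textbook Wald construction: estimate ± t × standard error. The sketch below redoes that arithmetic from the numbers in the summary(fit) output. It is a check of the reported values, not a description of how gslnls computes them internally; the object names are hypothetical.

```r
# Estimates and standard errors copied from the summary(fit) output
est <- c(Y0 = 5.91795, Ymax = 18.83981, MUmax = 1.18173, lag = 5.58393)
se  <- c(Y0 = 0.15454, Ymax = 0.18286, MUmax = 0.03892, lag = 0.25000)

# Residual degrees of freedom reported by summary(fit)
df.resid <- 7

# Two-sided 95% critical value of Student's t distribution
t.crit <- qt(0.975, df = df.resid)

# Wald-type interval: estimate -/+ t * standard error
ci <- cbind(lower = est - t.crit * se,
            upper = est + t.crit * se)
round(ci, 3)
```

The match also indicates that the rows labelled 1 to 4 in the confint() output correspond to Y0, Ymax, MUmax and lag, in that order.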

Plot the fitted values

  • Define a fine grid of time points at which to evaluate the fitted curve
new.times <- seq(0, 24, by = 0.1)
  • Use the predict() function to compute the prediction interval
fits <- predict(fit,
                newdata = data.frame(Time=new.times),
                interval = "prediction", level = 0.95)
str(fits)
 num [1:241, 1:3] 5.92 5.92 5.92 5.92 5.92 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:3] "fit" "lwr" "upr"
  • Inspect the first rows of the fits matrix
head(fits)
          fit      lwr      upr
[1,] 5.917954 5.297986 6.537921
[2,] 5.917954 5.297986 6.537921
[3,] 5.917954 5.297986 6.537921
[4,] 5.917954 5.297986 6.537921
[5,] 5.917954 5.297986 6.537921
[6,] 5.917954 5.297986 6.537921

Plot prediction interval

  • Create a plot of the original data with the fitted values superimposed
  • Plot the observed data using the plot() function
  • Use the lines() function to add the fitted curve and the prediction interval bounds
# Observed data for Condition 1, Repetition 1
plot(lnN ~ Time, data=df[df$Condition==1 & df$Repetition==1, ], ylim=c(5,20))
lines(new.times, fits[, "fit"], col="blue")  # fitted curve
lines(new.times, fits[, "lwr"], col="red")   # lower 95% prediction bound
lines(new.times, fits[, "upr"], col="red")   # upper 95% prediction bound

To start

Shiny app PredMicro: https://vcadavez.shinyapps.io/PredMicro/
